At my office we have networked 11 PCs together as our network rendering system. The issue I'm having is that one PC is always extremely slow. When tasks are assigned, one machine takes much, much longer to process (sometimes it gets stuck at "Sending Results" and sometimes at "Rendering"). In practice, if you send 4 renders through, the first one gets caught on the slow machine while the other 3 finish much faster. As a comparison, I sent the same render through 4 times and the times were 5:40, 1:10, 1:00, and 1:10. Any idea what might be causing this? I'm wondering whether it's a network issue or something to do with the KeyShot Network Rendering process.
Other Notes:
- It is not always the same PC, but it is usually the same PC for a while. I can look at the queue and say, "OK, this will be the slow machine today."
- It has happened with both CPU and GPU rendering. GPU is faster, but the affected machine still takes noticeably longer.
As a final note: I rendered the same scene using only my machine and it took just 66 seconds. We may drop the network renderer if it's simply faster to render solo.
Why keep a slow worker (machine) as a bottleneck in the NR setup? Just work with the other 10 machines and accept not using all the NR cores!
See the NR manual for details: https://manuals.keyshot.com/keyshot2024/manual/best-practices-cpu.html
And even when you use a previous version of KeyShot, the Network Rendering Manager/Worker versions can be newer, so you still get the benefits of the latest version.
The problem is that there is always a slow machine. If I have 10 machines (A-J) and computer A is slow, I can turn it off, but now I have 9 machines and B starts to bottleneck. If I turn A back on, it runs fine, but B is now our bottleneck. If I turn off A and B, I am down to 8 machines and computer C is now bottlenecking. There is always one causing the issue. The only time there is no issue is when the bottlenecked PC is rendering one of its own jobs.
Hi Jason,
That is very surprising behavior.
First off, you seem to be saying that if you send several jobs to network rendering, they get processed in parallel by your workers? Normally the manager sends out all the tasks of the job at the top of the list first, and only when they are all either being processed or done will it start sending tasks for another job. How many tasks do your jobs have in total? You can see this in the "Tasks" column in the monitor.
Second, it could be useful to monitor performance metrics on the slow machine, both while it is idle and while it is processing a job: CPU usage, memory usage, disk usage, and network usage. On Windows, you can find all of this in the Task Manager.
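If you want a log you can compare between the slow worker and a normal one (rather than watching Task Manager live), here is a minimal sketch using Python with the psutil package. This is just an illustration, not an official KeyShot tool, and the 5-second interval and CSV filename are arbitrary choices:

```python
# Quick-and-dirty metrics logger: run it on a worker while a job renders,
# then compare the CSVs from the slow machine and a normal machine.
# Assumes Python 3 and psutil (pip install psutil).
import csv
import time

import psutil

with open("worker_metrics.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "cpu_%", "mem_%",
                     "disk_read_MB", "disk_write_MB",
                     "net_sent_MB", "net_recv_MB"])
    while True:
        disk = psutil.disk_io_counters()   # cumulative disk I/O since boot
        net = psutil.net_io_counters()     # cumulative network I/O since boot
        writer.writerow([
            time.strftime("%H:%M:%S"),
            psutil.cpu_percent(interval=None),
            psutil.virtual_memory().percent,
            round(disk.read_bytes / 1e6, 1),
            round(disk.write_bytes / 1e6, 1),
            round(net.bytes_sent / 1e6, 1),
            round(net.bytes_recv / 1e6, 1),
        ])
        f.flush()
        time.sleep(5)  # sample every 5 seconds while the job runs
```

If the slow worker shows near-zero CPU while "Rendering" or a flat network line while "Sending Results", that narrows down whether it's a compute problem or a network problem.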
Then I have a couple more questions to try to understand where the issue could come from:
- Which version of network rendering are you using?
- Are you using a floating license server or a node-locked license?
- Do all workers have the same operating system, and which one?
As a side note, there are some performance improvements coming to network rendering in the next version of KeyShot, but unfortunately there is little chance they will solve your particular issue, since I don't understand where it comes from to begin with. Still, outside of your issue, those improvements should speed up renders at low sample counts (< 500).
Additional question: how many cores do you have in your license, and how many cores do you have in all your workers?
If you have fewer cores in your license than the available cores across all your machines combined, this scenario could absolutely happen. The manager only sends out tasks for the number of cores in your license, so one of your machines will not be used to its full extent.
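To make that concrete, here is a rough sketch of the idea (not KeyShot's actual scheduler, and the 24-cores-per-worker figure is made up for illustration): the license core budget is handed out worker by worker, and whatever worker the budget runs out on ends up only partially loaded.

```python
# Rough illustration of a license-capped core budget spread across workers.
# Not KeyShot's real dispatch logic; worker core counts are hypothetical.
def distribute_license_cores(license_cores, worker_cores):
    """worker_cores: list of core counts per worker, in dispatch order.
    Returns how many cores each worker would actually get to use."""
    remaining = license_cores
    usage = []
    for cores in worker_cores:
        used = min(cores, max(remaining, 0))
        usage.append(used)
        remaining -= used
    return usage

# Hypothetical example: 11 workers with 24 cores each (264 total),
# but only a 172-core license to spend.
workers = [24] * 11
print(distribute_license_cores(172, workers))
# -> [24, 24, 24, 24, 24, 24, 24, 4, 0, 0, 0]
```

In a setup like that, one worker would always be running on only a few cores and some would sit idle, which could look exactly like "there is always one slow machine."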
On average each job is 9 or 10 tasks. All PCs are currently running Windows 10 and the KeyShot 2023 network renderer; we have not had time to upgrade our manager machine to the new 2024 version. Each machine has a subscription, so we are not using a floating license. We have 264 cores in the license, but since we are using GPU rendering, each card counts as only 16 cores, so we are currently using only 172.
"On average each job is 9 or 10 tasks."
Could you check whether you have the same problem when you send a job that has more than 11 tasks? To do so, just send out a render at a large resolution, for example 4K, and with a high sample count, like 2000. There will be far more than 10 tasks, and each will take a long time, leaving you ample time to check whether a given worker really is slower at rendering than the others.
Each task is an atomic piece of the render, meaning it cannot be split between workers. So if you have a job with 10 tasks but 11 workers, it is expected that only 10 of them will be doing something, and one will be idle (there is no task left for it).
Actually, if I'm not mistaken, the last task in a job is done by the manager, so it's even fewer than that: for a 10-task job, only 9 workers will run concurrently, and 2 will be idle.
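Just to spell out that back-of-the-envelope count (a sketch under the assumption above that the manager takes one task itself):

```python
# How many workers can be busy at once for a given job, assuming the
# manager processes one task itself (as described above).
def busy_workers(num_tasks, num_workers, manager_takes_one=True):
    tasks_for_workers = num_tasks - 1 if manager_takes_one else num_tasks
    busy = min(tasks_for_workers, num_workers)
    return busy, num_workers - busy  # (busy workers, idle workers)

print(busy_workers(10, 11))  # -> (9, 2): nine rendering, two idle
```

So with 9- or 10-task jobs on an 11-worker farm, one or two machines sitting idle per job is expected behavior, separate from the "one machine is slow" problem.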